RamaDA: complete and automated conformational overview of proteins

The tertiary structure of a protein, as well as its local secondary structure organization, is fully
determined by the angles of the peptide bond. The backbone dihedral angles determine not only
the global fold of the protein, but also the details of the local chain organization. Although a
wealth of structural information is available in different databases and numerous structural biology
programs have been developed, rapid conformational characterization remains challenging.

We present here RamaDA, a program that gives a synthetic description of the conformation of
a protein. RamaDA is based on a model in which the Ramachandran plot is decomposed
into seven conformational domains. Within the framework of this model, each amino-acid of a given
protein is assigned to one of these domains.

From this assignment secondary structure elements can be detected with an accuracy equivalent 
to that of the DSSP program for helices and extended strands, and with the added capability of 
detecting PolyProline II secondary structures. Additionally, the determination of a z-score for each 
amino-acid of the protein emphasizes any irregularities in the element. 

This analysis can also be used to detect characteristic conformational patterns. In the case
of EF-hands, which are calcium-binding helix-loop-helix domains, a strict consensus can be designed
for the 9 amino-acids of the loop. With this pattern, 523 calcium-binding protein files are found in
the entire PDB, with only 2.7% false positive hits.

RamaDA gathers several tools in one program and thus gives complete information
on a protein structure, including loops and random-coil regions. The example of EF-hands
illustrates a promising approach to structural biology. RamaDA is freely available for download
and for online use at http://ramada.u-strasbg.fr




Background 

A protein is a hierarchical molecule, with a structure
that is organized into primary, secondary and tertiary
structures. However, the tertiary structure, as well as
the local secondary structure organization, is fully
determined by the angles (φ, ψ) of the peptide bond. The
backbone dihedral angles dictate not only the global fold
of the protein, but also the details of the local chain
organization.

The Ramachandran plot, first proposed in 1965 by
Ramakrishnan and Ramachandran [1], is a tailor-made tool
to study the conformations adopted by amino-acids. This
plot uses the dihedral angles φ and ψ to indicate whether
a specific pair is sterically allowed and/or which
conformational domain is adopted [1, 2]. Allowed (or
favoured) regions of this space have been associated with
regular secondary structure elements such as the α-helix
or the β-sheet, while empty disallowed regions have been
highlighted.

Since Ramakrishnan and Ramachandran's initial work,
several conformational domains have been identified in
the allowed regions of the plot [3–5]. In the literature,
one can find the extended region, which can be split into
the β-sheet and the PolyProline-II (PPII) domains [6]; the
α domain, corresponding to right-handed helical
conformations; the γ domain, corresponding to a specific
conformation of hydrogen-bonded γ-turns [7]; the ζ domain,
which is exclusively composed of conformations of
amino-acids preceding a proline [5]; the αL domain,
corresponding to left-handed helical conformations; and
the PPII_R domain, sometimes noted Ppr [8], corresponding
to right-handed PPII helical conformations. The existence
of these conformational domains is based only on steric
hindrance and does not take into account any other
parameter or external force.

The amount of structural information available in
databases such as the Protein Data Bank (PDB) [9] has
increased much faster than the number of programs
analyzing it. In fact, few programs and databases can give
accurate local information on proteins in the PDB [10–13],
and obtaining this information remains a challenge.
However, given the wealth of structural information
available in the PDB, it is possible to develop a
statistical model of the Ramachandran plot. From this
model, we have developed a program called RamaDA (for
Ramachandran Domain Analysis).

The RamaDA program takes into account all the
coordinates found in the analyzed file, including the
different models of the same protein created with NMR
constraints, in order to assign a conformational domain
to each amino-acid of a protein. This assignment leads to
the detection of putative secondary structure elements
and may be used to find specific conformational patterns
in the entire PDB. The latter is presented here through
the example of EF-hands. These domains are composed of
two helices separated by a 9-amino-acid loop known to
bind calcium. They are important for signal transduction
and muscle contraction [14].



Implementation 

RamaDA is written in the Python (www.python.org)
programming language and employs the Biopython library
[10]. The online version of RamaDA is hosted on an Apache
server. An equivalent standalone version is also freely
downloadable. Both take a protein structure file or a
conformational pattern (see below) and provide a graphic
output of the analysis.



Statistical model 

Lovell et al. [5] proposed a set of 500 protein
structures extracted from the PDB as representative of
the statistical distribution of the (φ, ψ) angles in the
Ramachandran plot. To this set, we added updated
structures (PDB:1XFF, 1GOK, 1E70 and 1IG5), but kept one
obsolete structure (PDB:1A1Y) in the list. This reference
dataset contains 110 018 amino-acids and is referred to
throughout this manuscript as top500. This reference set
was split into four subsets: glycines (Gly), amino-acids
preceding a proline (pre-Pro), prolines (Pro) and the
others (dataset called General).

The seven conformational domains composing the
Ramachandran plot that have been previously described in
the literature (namely R-helices, L-helices, β, γ, ζ,
PPII and PPII_R) were fitted by a set of 2D-Gaussian
functions cyclically defined over the complete periodic
[-180°, 180°] × [-180°, 180°] domain. Five parameters
are necessary to describe each 2D-Gaussian: the position
of the centre (φ_centre, ψ_centre), the standard
deviations along both axes of the 2D-Gaussian (σ_φ',
σ_ψ'), and the angle between the ψ axis of the
Ramachandran plot and the major axis of the 2D-Gaussian.
These parameters were first determined manually for each
domain and then fitted to the top500 distribution
assuming Poisson noise.

The statistical model of the Ramachandran plot
implemented in RamaDA is composed of a set of 2D-Gaussian
functions scaled to 1 (see Figure 1) and defined by the
parameters found with top500. These parameters are
gathered in Table 1.



Assignment and z-score calculation 

This set of 2D-Gaussian functions is used as a
description of the statistical distribution of the
backbone angles over the [-180°, 180°] × [-180°, 180°]
space. A particular conformation (φ, ψ) is given a
probability of belonging to each of the seven
conformational domains. The most probable conformational
domain is then assigned to the amino-acid at this
location.

FIG. 1: Ramachandran plot for top500. Each dot represents
an amino-acid. Seven domains are presented: β in red, PPII
in yellow, R-helices in blue, L-helices in violet, γ in
green, ζ in black and PPII_R in orange.

Domain             (φc, ψc)                σφ'       σψ'       Angle
R-helices          (-63.07°, -42.23°)      3.54°     5.77°     -36.31°
                   (-62.15°, -28.74°)      12.00°    4.69°     -61.76°
                   (-83.72°, -16.01°)      30.22°    10.95°    -56.15°
L-helices (Gly)    (82.38°, 6.89°)         7.55°     20.63°    -32.81°
      (pre-Pro)    (47.06°, 5.93°)         5.58°     6.79°     -27.38°
      (General)    (56.35°, 39.04°)        4.86°     15.18°    -25.00°
β                  (-119.12°, 136.48°)     15.77°    29.98°    -52.51°
PPII               (-68.03°, 144.89°)      9.65°     16.67°    -30.37°
γ                  (-84.90°, 69.28°)       5.85°     10.82°    -6.51°
ζ                  (-130.46°, 76.31°)      5.90°     12.80°    12.25°
PPII_R             (76.13°, -162.12°)      11.75°    41.02°    -29.27°

TABLE 1: List of 2D-Gaussian functions needed for the
statistical model of the Ramachandran plot and their
parameters.
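The assignment step can be illustrated with a minimal Python sketch. It is not the RamaDA source code: the function names are hypothetical, only one Gaussian per domain is used (five rows copied from Table 1), and both the rotation convention and the pairing of each standard deviation with a rotated axis are assumptions, since the text does not fully specify them. Periodicity is handled by wrapping angular differences to the nearest image.

```python
import math

# (phi_c, psi_c, sigma_1, sigma_2, angle) in degrees, copied from Table 1.
# Simplification: one Gaussian per domain, other domains are left out.
GAUSSIANS = {
    "H": (-63.07, -42.23, 3.54, 5.77, -36.31),    # first R-helices Gaussian only
    "B": (-119.12, 136.48, 15.77, 29.98, -52.51),
    "P": (-68.03, 144.89, 9.65, 16.67, -30.37),
    "G": (-84.90, 69.28, 5.85, 10.82, -6.51),
    "Z": (-130.46, 76.31, 5.90, 12.80, 12.25),
}


def wrap(delta):
    """Map an angular difference in degrees to the interval [-180, 180)."""
    return (delta + 180.0) % 360.0 - 180.0


def gaussian(phi, psi, params):
    """Periodic, rotated 2D Gaussian scaled to 1 at its centre."""
    phi_c, psi_c, s1, s2, angle = params
    dphi, dpsi = wrap(phi - phi_c), wrap(psi - psi_c)
    t = math.radians(angle)  # assumed rotation convention
    u = dphi * math.cos(t) + dpsi * math.sin(t)
    v = -dphi * math.sin(t) + dpsi * math.cos(t)
    return math.exp(-0.5 * ((u / s1) ** 2 + (v / s2) ** 2))


def assign(phi, psi):
    """Return the most probable domain letter and its probability."""
    scores = {d: gaussian(phi, psi, p) for d, p in GAUSSIANS.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


print(assign(-63.07, -42.23))  # centre of the first R-helices Gaussian
```

How several Gaussians belonging to the same domain are combined is not stated in the text, so the sketch deliberately avoids that case.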

In the case of an NMR structure, which is presented as
a set of models, the same amino-acid can be assigned to
various conformational states. It is considered to be
random-coil if it is not found in the same domain for
more than 65% of the structures in the ensemble.
Similarly, a residue is considered to be extended if the
β and PPII conformations together are found in more than
65% of the structures.
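The 65% rule for NMR ensembles can be sketched as follows. This is an illustration under stated assumptions rather than the actual implementation: the function name is hypothetical, each model is represented as a string of one-letter domain codes of equal length, and the order in which the two tests are applied (single domain first, then extended, then random-coil) is our reading of the text.

```python
from collections import Counter


def consensus_states(models, threshold=0.65):
    """Combine per-model domain strings into one consensus string.

    A residue keeps its domain letter when that domain is found in more
    than `threshold` of the models, becomes 'e' (extended) when B and P
    together exceed the threshold, and 'R' (random-coil) otherwise.
    """
    n_models = len(models)
    result = []
    for column in zip(*models):  # one column per residue
        counts = Counter(column)
        letter, top = counts.most_common(1)[0]
        if top / n_models > threshold:
            result.append(letter)
        elif (counts["B"] + counts["P"]) / n_models > threshold:
            result.append("e")
        else:
            result.append("R")
    return "".join(result)


# Toy ensemble of four models of a three-residue peptide.
print(consensus_states(["HBH", "HPL", "HBG", "HPB"]))
```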

For helices (respectively, β-strands and PPII helices),
stretches of more than three residues assigned to the
R-helices (respectively, β and PPII) domain are indicated
as putative secondary structure elements. Additionally,
when a residue in the PPII conformation (respectively β)
is surrounded by 4 other amino-acids assigned to the β
conformation (respectively PPII), the secondary structure
assignment is extended over this residue. In order to
highlight the potential secondary structure elements,
RamaDA graphically displays indications of helices, β
strands and PPII helices along with the associated
sequence. A typical output of the RamaDA program is shown
in Figure 2 for the Pseudomonas putida benzoylformate
decarboxylase (PDB:1BFD).
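The detection of putative secondary structure elements from the string of domain letters can be sketched as below. The helper names are hypothetical, and "surrounded by 4 other amino-acids" is interpreted here as two residues on each side, which is an assumption.

```python
import re


def bridge(states):
    """Extend beta/PPII assignments over an isolated P or B residue.

    Assumption: "surrounded by 4" means two flanking residues per side.
    """
    out = list(states)
    for i in range(2, len(states) - 2):
        around = states[i - 2:i] + states[i + 1:i + 3]
        if states[i] == "P" and around == "BBBB":
            out[i] = "B"
        elif states[i] == "B" and around == "PPPP":
            out[i] = "P"
    return "".join(out)


def putative_elements(states, min_len=4):
    """Return (letter, start, end) for runs of H, B or P longer than three.

    Indices are 0-based and `end` is exclusive.
    """
    pattern = "|".join(f"{c}{{{min_len},}}" for c in "HBP")
    return [(m.group()[0], m.start(), m.end())
            for m in re.finditer(pattern, bridge(states))]


print(putative_elements("LHHHHHGBBPBBZPPP"))
```

In this toy string the isolated P is absorbed into the β-strand, while the final stretch of three P residues is too short to be reported.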

For each amino-acid conformation, a z-score is
calculated as the distance from the centre of the
assigned conformational domain, expressed in units of
the 2D-Gaussian standard deviations. The probability P of
a conformation is linked to the z-score Z by the
following equation:

$P = e^{-Z^{2}/2}$ (1)

Formally, Z can be positive or negative, but since the
distance to the centre of the Gaussian function is the
only relevant quantity, Z is always taken as positive.
The z-score of a complete structure is the mean z-score
of all the residues. A second z-score is calculated for
all the amino-acids except glycines. With these
definitions, the mean z-score of a statistical series
following a 1D normal law is 0.7979, and 1.253 for a 2D
normal law. As a consequence, one expects the mean
z-score computed for a protein structure to be of the
order of 1.25. A larger value would indicate too many
departures from the ideal geometry, while a smaller value
would be the sign of overly stringent constraints.
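The two reference values quoted above can be checked numerically. The sketch below assumes that Z is the standardized distance to the centre, so that its mean is that of a half-normal variable in 1D, sqrt(2/π), and of a Rayleigh variable in 2D, sqrt(π/2). The function names are hypothetical, and the simulation is only a sanity check, not part of RamaDA.

```python
import math
import random


def expected_mean_z(dimension):
    """Analytic mean of the positive z-score for a 1D or 2D normal law."""
    if dimension == 1:
        return math.sqrt(2.0 / math.pi)   # mean of |x|, x ~ N(0, 1)
    if dimension == 2:
        return math.sqrt(math.pi / 2.0)   # mean of sqrt(x^2 + y^2)
    raise ValueError("only 1D and 2D are covered by this sketch")


def simulated_mean_z(n_samples, seed=0):
    """Monte Carlo estimate of both means from standard normal samples."""
    rng = random.Random(seed)
    total_1d = total_2d = 0.0
    for _ in range(n_samples):
        x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        total_1d += abs(x)
        total_2d += math.hypot(x, y)
    return total_1d / n_samples, total_2d / n_samples


print(round(expected_mean_z(1), 4), round(expected_mean_z(2), 3))
```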



Conformational pattern recognition 

The RamaDA program was applied to the entire PDB, 
and a database containing the conformational assignment 
of each entry was made. This database is provided with 
the RamaDA program. An updated version is provided 
weekly and can be downloaded from the website. 

To look for a conformational pattern in this database, 
the RamaDA program uses regular expressions and gives 
the position of each result in the concerned protein. A 
scan over the entire PDB takes only several seconds on a 
desktop machine. 

Patterns are described as a character string, using the
syntax of regular expressions. Conformational domains
are represented by a letter: H for R-helices, B for β,
P for PPII, L for L-helices, G for γ, Z for ζ and Q for
PPII_R. The conformational states extended and random-coil
are represented by e and R respectively.



Results and discussion 
Secondary structure elements determination 

The RamaDA program was applied to the top500 set
of proteins, and compared to the assignment given by the
DSSP program [15]. Table 2 gathers the results of this
comparison.

It can be seen that RamaDA finds the vast majority of
the secondary structure elements found by DSSP. Helices
and extended strands found by DSSP are assigned in
nearly 95% of the cases to their respective
conformational domains by RamaDA. This is not surprising
because secondary structure elements and conformational
domains are strongly linked. On the other hand, only
about 72% (respectively 65%) of the residues found by
RamaDA to lie in the helical domain (respectively β
domain) are described by DSSP as belonging to a helical
(respectively extended) secondary structure. This comes
from the fact that DSSP relies not only on backbone
angles, but also on hydrogen-bond patterns, which are not
analyzed by RamaDA. Short structural elements such as
turns or loops and irregular regions are nevertheless
made up of amino-acids lying in regular conformational
domains and are detected as such.
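The kind of comparison gathered in Table 2 amounts to a row-normalized cross-tabulation of two per-residue assignments. A minimal sketch is given below; the function name and the toy strings are hypothetical, the DSSP codes used are H (α helix), G (3₁₀ helix) and E (extended strand), and "-" stands for no assignment.

```python
from collections import Counter, defaultdict


def cross_tabulate(rows, columns):
    """Percentage of each `columns` label within each `rows` label.

    Both arguments are aligned per-residue label sequences.
    """
    counts = defaultdict(Counter)
    for row_label, column_label in zip(rows, columns):
        counts[row_label][column_label] += 1
    return {
        row_label: {c: 100.0 * n / sum(cs.values()) for c, n in cs.items()}
        for row_label, cs in counts.items()
    }


rama = "HHHHBBBBPP"   # toy RamaDA domain letters
dssp = "HHHGEEE-E-"   # toy DSSP codes for the same residues
print(cross_tabulate(rama, dssp)["H"])
```

Swapping the two arguments gives the reverse view, that is, the distribution of RamaDA domains within each DSSP class.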

DSSP can also detect conformational irregularities in
otherwise regular secondary structures. Figure 3
illustrates this fact by presenting the Ramachandran plot
of the helices found by DSSP on top500. It can clearly be
seen that while most of the amino-acids assigned by DSSP
to a helical secondary structure lie in the helical
conformational domain, many are scattered over all the
accessible regions. In contrast, all the residues
assigned by RamaDA to a helical domain lie inside the
central helical domain of the Ramachandran plot.

Finally, DSSP does not attempt to analyze the
PolyProline II (PPII) secondary structure, and most of
these structures are classed as "no assignment" (Table
2). PPII is an extended regular secondary structure
characterized by a pitch of three residues per turn and
strong sequential side-chain contacts. There is however
no specific H-bond involved in the stabilization of this
structural pattern.

In the PDB dated September 22nd, 2011, 35715 segments
of more than 5 residues and 590 segments of more than 10
residues were assigned to the PPII conformational domain
by the RamaDA program. While a comprehensive analysis of
the 590 segments has not been attempted, a rapid overview
of some of them has confirmed the PPII assignment. Such
an example is shown in Figure 2b, where a 14-residue-long
canonical PPII helix was found. This segment is not
assigned to any secondary structure element in the
sequence analysis given in the corresponding PDB entry,
but is described as a PPII helix by the authors [16].

[Figure 2a: graphical RamaDA output for PDB:1BFD, chain A (z-score: 1.21; total z-score: 1.21; total z-score without glycines: 1.19).]

FIG. 2: a) Output of RamaDA for PDB:1BFD. The first line shows the histogram of z-scores. Green is used for z-scores lower
than 1, yellow for z-scores between 1 and 2, orange for z-scores between 2 and 3 and red for z-scores higher than 3. The second
line corresponds to the protein sequence. Cis-prolines are detected thanks to their ω dihedral angle and indicated by p. The
third line indicates putative secondary structure elements (blue waves for helices, yellow waves for PPII helices and red arrows
for β strands). The last line gives the conformational domain assignment (H for R-helices, B for β, P for PPII, L for L-helices,
G for γ, Z for ζ and Q for PPII_R). b) 3-dimensional structure of the same protein. The black box highlights the longest
PPII helix found by RamaDA for this protein. This protein segment is indeed a PPII helix.






RamaDA domain    DSSP analysis      Presence
H                α helix            64.0 %
                 π helix            <0.1 %
                 3₁₀ helix          7.8 %
                 extended strand    1.4 %
                 no assignment      4.8 %
                 others             22.0 %
B                α helix            <0.1 %
                 π helix
                 3₁₀ helix          <0.1 %
                 extended strand    64.9 %
                 no assignment      23.1 %
                 others             12.0 %
P                α helix            <0.1 %
                 π helix
                 3₁₀ helix          0.4 %
                 extended strand    15.6 %
                 no assignment      61.3 %
                 others             22.7 %

DSSP analysis      RamaDA domain    Presence
α helix            H                99.7 %
                   B                <0.1 %
                   P                <0.1 %
                   others           0.3 %
π helix            H                96.8 %
                   B
                   P
                   others           3.2 %
3₁₀ helix          H                93.4 %
                   B                0.1 %
                   P                1.2 %
                   others           5.3 %
extended strand    H                3.1 %
                   B                85.1 %
                   P                9.5 %
                   others           2.3 %
no assignment      H                24.2 %
                   B                34.0 %
                   P                42.0 %
                   others           11.6 %

TABLE 2: Comparison between assignments for secondary
structure elements on top500.

Example of EF-hands 

In this section, we use the conformational analysis
provided by RamaDA to rapidly search the PDB for
conformational patterns. Many calcium-binding proteins
contain a characteristic structure, called the EF-hand,
which is composed of a calcium-binding domain of 9
consecutive residues, flanked by two α-helices. An
ensemble of NMR structures for 4 EF-hand-containing
proteins was extracted from the PDB. These structures
are those of calerythrin (PDB:1NYA), calmodulin
(PDB:2BBM), parvalbumin (PDB:1RJV) and the human cardiac
sodium channel NaV1.5 (PDB:2KBI). As these proteins
contain one to four EF-hand patterns, this set contains
10 unique EF-hand sequences.

FIG. 3: Ramachandran plot for top500 amino-acids assigned
to helices according to DSSP.

The conformational assignments of the 10 calcium-binding
loops were computed and compared, and a consensus
pattern was established. Table 3 compares three
different consensus patterns: amino-acid sequence, DSSP
and RamaDA.

It clearly appears that RamaDA offers a simple con- 
sensus conformational pattern. 
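As an illustration, the consensus can be derived column by column from the ten RamaDA loop assignments listed in Table 3. The sketch below is hypothetical: the function name and the rule that a position with more than two alternatives collapses to a dot are our assumptions, while the equivalence between e and the set {B, P, e} follows the caption of Table 3.

```python
# RamaDA assignments of the ten calcium-binding loops (Table 3).
LOOPS = [
    "LHHLHLBBB", "BHHLHLBBP", "BHHLHLBBP",               # calerythrin
    "BHHQHLBBB", "BHHLHLBBB", "GHHQHLBBB", "BHHQHLBBB",  # calmodulin
    "RHHLHLBBB", "GHHLHLBPB",                            # parvalbumin
    "ZHHLHLBBP",                                         # cardiac sodium channel
]


def consensus(assignments, max_alternatives=2):
    """Build a regular-expression consensus from aligned assignments."""
    parts = []
    for column in zip(*assignments):
        letters = sorted(set(column))
        if len(letters) == 1:
            parts.append(letters[0])                    # strictly conserved
        elif set(letters) <= {"B", "P", "e"}:
            parts.append("e")                           # extended state
        elif len(letters) <= max_alternatives:
            parts.append("[" + "".join(letters) + "]")  # few alternatives
        else:
            parts.append(".")                           # anything (assumed rule)
    return "".join(parts)


print(consensus(LOOPS))
```

Under these assumptions the sketch yields `.HH[LQ]HLBee`, which is the loop part of the pattern used for the PDB search.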

The search for EF-hand domains was performed with the
consensus pattern flanked by two R-helices of at least 7
residues. The pattern used was H{7,}.HH[LQ]HLBeeH{7,}
(using the regular expression syntax, where a dot matches
any character, A{x,} means A repeated at least x times,
[AB] means A or B, and using the nomenclature defined in
the Implementation section). Using this pattern, 537 hits
were found in the PDB dated September 22nd, 2011.
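A search of this kind can be sketched with Python's standard `re` module. The database layout (a dictionary mapping an entry identifier to its chains and their strings of domain letters), the function names and the toy entries are all hypothetical. Following the caption of Table 3, e is expanded to [BPe] before compilation, and the naive string replacement used for this is only valid for patterns where e never appears inside brackets.

```python
import re

EF_HAND = "H{7,}.HH[LQ]HLBeeH{7,}"


def compile_pattern(pattern):
    """Compile a conformational pattern, expanding 'e' to [BPe]."""
    return re.compile(pattern.replace("e", "[BPe]"))


def scan(database, pattern):
    """Return (entry, chain, first, last) for every hit, 1-based inclusive."""
    regex = compile_pattern(pattern)
    hits = []
    for entry, chains in database.items():
        for chain, states in chains.items():
            for match in regex.finditer(states):
                hits.append((entry, chain, match.start() + 1, match.end()))
    return hits


# Toy database with made-up identifiers and assignments.
toy_database = {
    "toy1": {"A": "LL" + "H" * 8 + "BHHLHLBBP" + "H" * 8 + "G"},
    "toy2": {"A": "HHHHHHHHBBBBBBBB"},
}
print(scan(toy_database, EF_HAND))
```

Because each entry is reduced to one short string per chain, a scan is a simple linear pass over strings, which is consistent with the short search times reported for the whole PDB.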

Each of these hits was manually analyzed to confirm the
presence of an EF-hand domain, using either the direct
description given by the authors or known calcium-binding
activity. 523 hits were true positives, meaning that the
chosen pattern allows the detection of EF-hand domains
with a 97.3% accuracy.

The chosen consensus pattern may appear too strict,
especially concerning the flanking helices. As stressed
previously, some amino-acids of helices may not adopt the
R-helices conformational domain. The boundaries of the
helices may therefore differ from those expected.

Ca-binding protein                   Ca-binding domain           RamaDA analysis   DSSP analysis
calerythrin (PDB:1NYA)               DFDGNGALE                   LHHLHLBBB         -SS--SSB-
                                     DKNADGQIN                   BHHLHLBBP         -SS--SEEE
                                     DTNGNGELS                   BHHLHLBBP         -TT-SSEEE
calmodulin (PDB:2BBM)                DKDGDGTIT                   BHHQHLBBB         -SSSS--B-
                                     DADGNGTID                   BHHLHLBBB         -SS-SSSB-
                                     DKDGNGYIS                   GHHQHLBBB         S-SSSSSB-
                                     DIDGDGQVN                   BHHQHLBBB         -SSS-SSB-
parvalbumin (PDB:1RJV)               DKDKSGFIE                   RHHLHLBBB         -TT-SS-B-
                                     DKDGDGKIG                   GHHLHLBPB         -SSSSSSB-
cardiac sodium channel (PDB:2KBI)    DPEATQFIE                   ZHHLHLBBP         -TT--SEEE
consensus pattern                    DX[DNE][GAK]X[GQ]X[ILV]X    .HH[LQ]HLBee      [-S][-ST][ST][-S][-S][-S][-SE][BE][-E]

TABLE 3: Ensemble of 10 EF-hand structures from 4 different EF-hand-containing proteins: sequence, RamaDA assignment
and DSSP analysis. X and . correspond to any value, values between square brackets show the different possibilities for a same
amino-acid and e is equivalent to [BPe].



The pattern H{6,}..HH[LQ]HLBeeH{7,} leads to the
detection of three more PDB files that are also true
positives. However, the pattern H{7,}.HH[LQ]HLBee.H{6,}
leads to the detection of only three true positive hits
out of 15 additional files, letting the accuracy drop to
96.2%.

To compare these results to others, a search for the
amino-acid sequence consensus was performed with
PATTINPROT [17] over the entire PDB. 932 PDB files are
found, and only 224 of them are also found by RamaDA,
meaning that more than 300 files found thanks to the
conformational pattern are missed with the amino-acid
sequence pattern. Moreover, a vast majority of the files
detected by PATTINPROT only are false positives. As for
the DSSP consensus, no search was performed as it is
obviously too broad to give accurate results.



Acknowledgements 

We would like to thank the CNRS and NMRtec for 
their financial support, Roland Stote and Bruno Kieffer 
for fruitful discussions and Claude Ling for the techni- 
cal support on the website. This study was partially 
supported by funds from the Agence Nationale de la 
Recherche (ANR-07-PROTANIN) 

List of abbreviations 

RamaDA: Ramachandran Domain Analysis ; PDB: 
Protein Data Bank 



Conclusion 

The description of a protein given by RamaDA is a
synthetic view of the local protein chain organization. It
provides an accurate detection of secondary structure
elements and also of local patterns adopted by
amino-acids. This information is rapidly obtained, and is
available online as well as through a standalone
application. Moreover, the example of EF-hands shows that
the use of RamaDA enhances conformational pattern
recognition and illustrates a promising approach to
structural biology. It is also interesting to note that,
thanks to its simple usage and its fast results, RamaDA
can be applied to large sets of structure files
including, for example, those created via
multiconformational analysis.



[1] C. Ramakrishnan and G. N. Ramachandran, Biophys J
5, 909 (1965).

[2] J. T. Edsall, P. J. Flory, J. C. Kendrew, A. M. Liquori,
G. Nemethy, G. N. Ramachandran, and H. A. Scheraga,
J Biol Chem 241, 1004 (1966).

[3] G. J. Kleywegt and T. A. Jones, Structure 4, 1395 (1996).

[4] S. Hovmöller, T. Zhou, and T. Ohlson, Acta Crystallogr
D Biol Crystallogr 58, 768 (2002).

[5] S. C. Lovell, I. W. Davis, W. B. Arendall, P. I. W.
de Bakker, J. M. Word, M. G. Prisant, J. S. Richardson,
and D. C. Richardson, Proteins 50, 437 (2003).

[6] V. Muñoz and L. Serrano, Proteins 20, 301 (1994).

[7] E. J. Milner-White, J Mol Biol 216, 386 (1990).

[8] B. K. Ho and R. Brasseur, BMC Struct Biol 5, 14 (2005).

[9] H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N.
Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne,
Nucleic Acids Res 28, 235 (2000).

[10] P. J. A. Cock, T. Antao, J. T. Chang, B. A. Chapman,
C. J. Cox, A. Dalke, I. Friedberg, T. Hamelryck, F. Kauff,
B. Wilczynski, et al., Bioinformatics 25, 1422 (2009).

[11] A. Bornot, C. Etchebest, and A. G. de Brevern, Proteins
79, 839 (2011).

[12] R. P. Joosten, T. A. H. te Beek, E. Krieger, M. L.
Hekkelman, R. W. W. Hooft, R. Schneider, C. Sander, and
G. Vriend, Nucleic Acids Res 39, D411 (2011).

[13] M. T. Zimmermann, A. Kloczkowski, and R. L. Jernigan,
BMC Bioinformatics 12, 264 (2011).

[14] A. Lewit-Bentley and S. Réty, Curr Opin Struct Biol 10,
637 (2000).

[15] W. Kabsch and C. Sander, Biopolymers 22, 2577 (1983).

[16] M. S. Hasson, A. Muscate, M. J. McLeish, L. S.
Polovnikova, J. A. Gerlt, G. L. Kenyon, G. A. Petsko,
and D. Ringe, Biochemistry 37, 9918 (1998).

[17] C. Combet, C. Blanchet, C. Geourjon, and G. Deléage,
Trends in Biochemical Sciences 25, 147 (2000).



